Classification tree analysis using TARGET

Authors

  • J. Brian Gray
  • Guangzhe Fan
Abstract

Tree models are valuable tools for predictive modeling and data mining. Traditional tree-growing methodologies such as CART are known to suffer from problems including greediness, instability, and bias in split rule selection. Alternative tree methods, including Bayesian CART (Chipman et al., 1998; Denison et al., 1998), random forests (Breiman, 2001a), bootstrap bumping (Tibshirani and Knight, 1999), QUEST (Loh and Shih, 1997), and CRUISE (Kim and Loh, 2001), have been proposed to resolve these issues from various aspects, but each has its own drawbacks. Gray and Fan (2003) described a genetic algorithm approach to constructing decision trees, called tree analysis with randomly generated and evolved trees (TARGET), that performs a better search of the tree model space and largely resolves the problems with current tree modeling techniques. Utilizing the Bayesian information criterion (BIC), Fan and Gray (2005) developed a version of TARGET for regression tree analysis. In this article, we consider the construction of classification trees using TARGET. We modify the BIC to handle a categorical response variable, and we also adjust its penalty component to better account for the model complexity of TARGET. We also incorporate the option of splitting rules based on linear combinations of two or three variables in TARGET, which greatly improves the prediction accuracy of TARGET trees. Comparisons of TARGET to existing methods, using simulated and real data sets, indicate that TARGET has advantages over these other approaches. © 2007 Elsevier B.V. All rights reserved.
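
As a rough illustration of how a BIC-style criterion can score a candidate classification tree, the sketch below computes a multinomial log-likelihood from the class counts in each leaf and adds a complexity penalty. It is a minimal sketch under stated assumptions, not the criterion used in the paper: the abstract does not give the modified penalty, so the parameter count (one per split plus the free class probabilities in each leaf) and the `penalty_per_param` multiplier are hypothetical choices, as are the function names.

```python
import math


def tree_log_likelihood(leaf_class_counts):
    """Multinomial log-likelihood, using each leaf's class proportions as estimates."""
    ll = 0.0
    for counts in leaf_class_counts:
        n_leaf = sum(counts)
        for c in counts:
            if c > 0:  # empty classes contribute nothing (0 * log 0 is taken as 0)
                ll += c * math.log(c / n_leaf)
    return ll


def bic_like_score(leaf_class_counts, penalty_per_param=1.0):
    """BIC-style score for a tree described by its leaf class counts; lower is better.

    Assumption: the parameter count is (K - 1) class probabilities per leaf
    plus one parameter per internal split of a binary tree. The paper's
    adjusted penalty is not reproduced here.
    """
    n = sum(sum(counts) for counts in leaf_class_counts)
    n_leaves = len(leaf_class_counts)
    n_classes = len(leaf_class_counts[0])
    n_params = n_leaves * (n_classes - 1) + (n_leaves - 1)
    deviance = -2.0 * tree_log_likelihood(leaf_class_counts)
    return deviance + penalty_per_param * n_params * math.log(n)


# Compare a single-leaf tree with a two-leaf tree that separates the classes.
stump = [[10, 10]]
split = [[10, 0], [0, 10]]
print(bic_like_score(stump), bic_like_score(split))
```

In a genetic search of the kind the abstract describes, a score like this could serve as the fitness used to rank candidate trees; raising `penalty_per_param` would favor smaller trees.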

Similar resources

Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images

Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-required, free satellite data are available in coarse resolution and the use of manned aircraft is relatively costly. Recently, unmanned aeria...

Identification of Geochemical Anomalies Using Fractal and LOLIMOT Neuro-Fuzzy modeling in Mial Area, Central Iran

The Urumieh-Dokhtar Magmatic Arc (UDMA) is recognized as an important porphyry, disseminated, vein-type and polymetallic mineralization arc. The aim of this study is to identify and subsequently determine geochemical anomalies for exploration of Pb, Zn and Cu mineralization in Mial district situated in UDMA. Factor analysis, Concentration-Number (C-N) fractal model and Local Linear Model Tree (...

Identification of the most important factors of ethnic differences in anthropometric dimensions of Iranian workers using the decision tree

Background and aims: Anthropometry is the branch of human science that considers the physical measurement of the human body, especially size and shape. One application of anthropometrical data in ergonomics is the design of working space and the development of industrialized products. So that the tools, equipment and workstations, which designed based on the physical dimensions of the workers, ...

Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...

Land Cover Classification Using IRS-1D Data and a Decision Tree Classifier

Land cover is one of the basic data layers in geographic information systems for physical planning and environmental monitoring. Digital image classification is generally performed to produce land cover maps from remote sensing data, particularly for large areas. In the present study the multispectral image from IRS LISS-III image along with ancillary data such as vegetation indices, principal componen...

Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm

Results of damage prediction in buildings can be used as a useful tool for managing and decreasing seismic risk of earthquakes. In this study, damage spectrum and C4.5 decision tree algorithm were utilized for damage prediction in steel buildings during earthquakes. In order to prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...

Journal:
  • Computational Statistics & Data Analysis

Volume 52  Issue

Pages  -

Publication date 2008